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Abstract 

Errors in the data and the forward operator of an inverse problem 
can be handily modelled using partial order in Banach lattices. We 
present some existing results of the theory of regularisation in this 
novel framework, where errors are represented as bounds by means 
of the appropriate partial order. 

We apply the theory to diffusion tensor imaging (DTI), where cor¬ 
rect noise modelling is challenging: it involves the Rician distribution 
and the nonlinear Stejskal-Tanner equation. Linearisation of the latter 
in the statistical framework would complicate the noise model even 
further. We avoid this using the error bounds approach, which pre¬ 
serves simple error structure under monotone transformations. 


1 Introduction 

Often in inverse problems, we have only very rough knowledge of noise 
models, or the exact model is too difficult to realise in a numerical recon¬ 
struction method. The data may also contain process artefacts from black 
box devices [41]. Partial order in Banach lattices has therefore recently been 
investigated in [30, 32, 31] as a less-assuming error modelling approach for 
inverse problems. This framework allows the representation of errors in 
the data as well as in the forward operator of an inverse problem by means 
of order intervals (i.e., lower and upper bounds by means of appropriate 
partial orders). An important advantage of this approach vs. statistical 
noise modelling is that deterministic error bounds preserve their simple 
structure under monotone transformations. 
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We apply partial order in Banach lattices to diffusion tensor imaging 
(DTI). We will in due course explain the diffusion tensor imaging progress, 
as well as the theory of inverse problems in Banach lattices, but start by 
introducing our model 

min R{u) subject to > 0, 

^ Aju ^ C^-a.e. on Q, (j = 1,..., N). 

That is, we want to find a field of symmetric 2-tensors u : Q ^ Sym^(M^) 
on the domain 12 c M^, minimising the value of the regulariser R on the 
feasible set. The tensor field u is our unknown image. It is subject a pos¬ 
itivity constraint, as well as partial order constraints imposed through the 
operators [Aju]{x) := — {bj^ u{x)bj), and the upper and lower bounds gj := 
log(s^/5Q) and g'j := log(s^/sQ). These model, in terms of error intervals 
after logarithmic transformation, the Stejskal-Tanner equation 

Sj{x) = so{x)e^p{-{bj,u{x)bj)), {j = 1,...,A^), (1.1) 

central to the diffusion tensor imaging process. 

To shed more light on u and the equation (1.1), let us briefly outline 
the diffusion tensor imaging process. As a first step towards DTI, dif¬ 
fusion weighted magnetic resonance imaging (DWI) is performed. This 
process measures the anisotropic diffusion of water molecules. To cap¬ 
ture the diffusion information, the magnetic resonance images have to be 
measured with diffusion sensitising gradients in multiple directions. These 
are the different 6/s in (1.1). Eventually, multiple DWI images {sj} are re¬ 
lated through the Stejskal-Tanner equation (1.1) to the symmetric positive- 
definite diffusion-tensor field u : ^ ^ Sym^(M^) [4, 27]. At each point 
X G fl, the tensor u{x) is the covariance matrix of a normal distribution for 
the probability of water diffusing in different spatial directions. 

The fact that multiple 6/s are needed to recover u, leads to very long 
acquisition times, even with ultra fast sequences like echo planar imaging 
(EPI). Therefore, DTI is inherently a low-resolution and low-SNR method. 
In theory, the amplitude DWI images exhibit Rician noise [23]. However, 
as the histogram of an in vivo measurement in Eigure 1 illustrates, this may 
not be the case for practical data sets from black-box devices. Moreover, the 
DWI process is prone to eddy-current distortions [50], and due to the slow¬ 
ness of it, it is very sensitive to patient motion [26, 1]. We therefore have 
to use techniques that remove these artefacts in solving for u{x). We also 
need to ensure the positivity u, as non-positive-definite diffusion tensor are 
non-physical. One proposed approach for the satisfaction of this constraint 
is that of log-Euclidean metrics [3]. This approach has several theoretically 
desirable aspects, but some practical shortcomings [54]. Special Perona- 
Malik type constructions on Riemannian manifolds can also be used to 
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(a) Slice of a real (b) 50-bin histogram of noise (c) 50-bin histogram after 
MRI measurement estimated from background eddy-current correction 

with FSL 


Figure 1: The noise in the absolute values of complex MRI data should be 
Rician. FFere we have taken a 50-bin histogram of the noise in real data. 
This divides the pixels into bins of 50 different noise levels. FFowever, we 
only find approximately 10 noise levels to have non-zero pixel count. As 
the Rician distribution is continuous, we see that the noise cannot be Rician, 
some bins of the 50-bin histogram being empty. The measurement setup of 
the data used here is described in Section 5.3. 


maintain the structure of the tensor field [14, 51]. Such anisotropic diffu¬ 
sion is however severely ill-posed [?]. Recently manifold-valued discrete- 
domain total variation models have also been applied to diffusion tensor 
imaging [6]. 

Our approach is also in the total variation family, first considered for 
diffusion tensor imaging in [45]. Namely, we follow up on the work in [54, 
56, 55, 52] on the application of total generalised variation regularisation 
[9] to DTI. We should note that in all of these works the fidelity function 
was the ROF-type [42] Lp‘ fidelity. This would only be correct, according 
to the assumption that noise of MRI measurements is Gaussian, if we had 
access to the original complex k-space MRI data. The noise of the inverse 
Fourier-transformed magnitude data Sj, that we have in practice access to, 
is however Rician under the Gaussian assumption on the original complex 
data [23]. This is not modelled by the L? fidelity. 

Numerical implementation of Rician noise modelling has been studied 
in [37, 21]. As already discussed, in this work, we take the other direc¬ 
tion. Instead of modelling the errors in a statistically accurate fashion, not 
assuming to know an exact noise model, we represent them by means of 
pointwise bounds. The details of the model are presented in Section 3. We 
study the practical performance in Section 5 using the numerical method 
presented in Section 4. First we however start with the general error mod¬ 
elling theory in Section 2. Readers who are not familiar with notation for 
Banach lattices or symmetric tensor fields are advised to start with the Ap¬ 
pendix, where we introduce our mathematical notation and techniques. 
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2 Deterministic error modelling 

2.1 Mathematical basis 

We now briefly outline the theoretical framework [30] that is the basis for 
our approach. Consider a linear operator equation 


Au = /, u ^ f ^ 


( 2 . 1 ) 


where U and F are Banach lattices. A : U ^ F is bl regular injective op¬ 
erator. The inaccuracies in the right-hand side / and the operator A are 
represented as bounds by means of appropriate partial orders, i.e. 


f,r-. 

A\A^: a} a A^ 


( 2 . 2 ) 


where the symbol stands for the partial order in F and the 

partial order for regular operators induced by partial orders in U and F. 
Further, we will drop the subscripts at inequality signs where it will not 
cause confusion. 

The exact right-hand side / and operator A are not available. Given 
the approximate data (/^, /^, A\A^), we need to find an approximate solu¬ 
tion u that converges to the exact solution u as the inaccuracies in the data 
diminish. This statement needs to be formalised. We consider monotone 
convergent sequences of lower and upper bounds 


Jn‘ JnFl ^ Jn'i ' ^n+1 ^ 



(2.3) 


We are looking for an approximate solution Un such that \\un — u\\u ^ 0 as 
n ^ oo. 

Let us ask the following question. What are the elements u e U that 
could have produced data within the tolerances (2.3)? Obviously, the exact 
solution is one of such elements. Let us call the set containing all such 
elements the feasible set Un CU. 

Suppose that we know a priori that the exact solution is positive (by 
means of the appropriate partial order in U). Then it is easy to verify that 
the following inequalities hold for all n G N 



This observation motivates our choice of the feasible set: 


Un = {u ^ U: u ^jj 0 , A^ 
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It is clear that the exact solution u belongs to the sets Un for all n G N. 
Our goal is to define a rule that will choose for any n an element Un of 
the set Un such that the sequence Un G Un will strongly converge to the 
exact solution u. We do so by minimising an appropriate regularisation 
functional R{u) on Un- 

Un = arg min R{u). (2.4) 

U^Un 

This method, in fact, is a lattice analogue of the well-known residual 
method [22,49]. The convergence result is as follows [30]. 

Theorem 2.1. Suppose that 

1. R{u) is bounded from below on U, 

2. R{u) is lower semi-continuous, 

3. level sets {u: R{u) ^ C} (C = const) are sequentially compact in U (in 
the strong topology induced by the norm). 

Then the sequence defined in (2.4) strongly converges to the exact solution u and 
'l2'(^Uji') —y 72(‘u). 

Examples of regularisation functionals that satisfy the conditions of The¬ 
orem 2.1 are as follows. Total Variation in where 12 is a subset of 

W^, assures strong convergence in U, given that the L^-norm of the solu¬ 
tion is bounded. The Sobolev norm ||rt||^yi,i 3 (Q) yields strong convergence 
in the spaces U’(Tl), where p ^ 1, q > The latter fact follows from 

the compact embedding of the corresponding Sobolev W^’^(Tl) space into 
LP(n) [17]. 

However, the assumption that the sets {u: R(u) ^ C} are strong com¬ 
pacts in U is quite strong. It can be replaced by the assumption of weak 
compactness, provided that the regularisation functional possesses the so- 
called Radon-Riesz property. 

Definition 2.1. A functional F: U has the Radon-Riesz property (some¬ 
times referred to as the iJ-property), if for any sequence Un weak con¬ 
vergence Un uo and simultaneous convergence of the values F(un) —> 
F{uo) imply strong convergence Un —> uq. 

Theorem 2.2. Suppose that 

1. R{u) is bounded from below on U, 

2. R(u) is weakly lower semi-continuous, 

3. level sets R(u) ^ C (C = const) are weakly sequentially compact in U, 

4. R{u) possesses the Radon-Riesz property. 
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Then the sequence defined in (2.4) strongly converges to the exact solution u and 
TZ(un) —>■ TZ(u). 


It is easy to verify that the norm in any Hilbert space possesses the 
Radon-Riesz property. Moreover, this holds for the norm in any reflexive 
Banach space [17]. 

As we explain in the Appendix, the spaces Sym^(]R"*)) are not 

Banach lattices, therefore. Theorems 2.1 and 2.2 cannot be applied directly. 
Further theoretical work will be undertaken to extend the framework to the 
non-lattice case. For the moment, however, we will prove that if there are 
no errors in the operator A in (2.1), the requirement that the solution space 
U is a lattice can be dropped. 

Theorem 2.3. Let U he a Banach space, and Fbea Banach lattice. Let the operator 
A in (2.1) he a linear, continuous and injective operator. Let fj, and /“ he sequences 
of lower and upper hounds for the right-hand side defined in (2.3), and suppose that 
there are no errors in the operator A. Let us redefine the feasible set in the following 
way 

Un = {ueU-. fli^FAui^Fffl}. 

Suppose also that the regulariser R(x) satisfies conditions of either Theorem 2.1 or 
Theorem 2.2. Then the sequence defined in (2.4) strongly converges to the exact 
solution u and R(un) —>■ R(u). 


Proof. Let us define an approximate right-hand side and its approximation 
error in the following way 


hn = 


fU , f. 

J n ' J 1 


Sn = 


W-fnW 


One can easily verify, that the inequality ||/ — / 5 „ |1 ^ Sn holds. Indeed, we 
have 

£U _ fl 

f - f. < - f, = An _ An 

J J On ^ J IT J On 2 ’ 

TU _ tI 

(£ f \ ^ f fl _ J n J n 

\J JSn) ^ JSn Jn ~ 2 ’ 

TU _ tI 

1/ -fSn\ = {f- fSn) V (-(/ - fSn)) ^ ’ 


11/ - fSnW ^ 


f I 

ln\ 


The first two inequalities are consequences of the conditions (2.3), the third 
one holds by the definition of supremum and the equality |/| = /v(-/) 
that holds for all / G F, and the last inequality is due to the monotonicity 
of the norm in a Banach lattice. 

Similarly, one can show that for any u G we have 


\\Au - fsj ^ 5n. 
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Therefore, the inclusion Un C {u: \\Au — fs^W ^ ^n} holds. 

Now we will proceed with the proof of convergence \\un — u\\ 0. 

Will prove it for the case when the regulariser R{u) satisfies conditions of 
Theorem 2.1. Suppose that the sequence Un does not converge to the exact 
solution u. Then it contains a subsequence Un^ such that \\unj. — u\\ ^ e for 
any A: G N and some fixed > 0. 

Since the inclusion u ^ Un holds for all n G N, we have R{un) < R{u) 
for all n G N. Since the level set {u\ R{u) < R{u)} is a compact set by as¬ 
sumptions of the theorem, the sequence Unj. contains a strongly convergent 
subsequence. With no loss of generality, let us assume that Un^ We 

will now show that = u. Indeed, we have 

\\AUnj, - Au\\ ^ \\AUn^ - fSnW + 11/ “ fSnW ^ ^ 0- 

On the other hand, we have 

~ Au\\ ~ Au\\ 

due to continuity of A and || • ||. Therefore, Auq = Ail and uq = u, since A 
is an injective operator. By contradiction, we get ||?i^ — i^|| ^0. 

Finally, since the regulariser R(u) is lower semi-continuous, we get that 
lim inf R{un) = R{u). However, for any n we have R{un) ^ R{u), therefore, 
we get the convergence R{un) R{u) as n ^ oo. □ 

2.2 Philosophical discussion and statistical interpretation 

In practice, our data is discrete. So let us momentarily switch to measure¬ 
ments / = (/^,..., /^) G of a true data / G If we actually knew 
the pointwise noise model of the data, then one way to obtain potentially 
useful upper and lower bounds for the deterministic model is by means of 
statistical interval estimates: confidence intervals. Roughly, the idea is to 
find for each true signal / individual random upper and lower bounds 
and f such that 

p{r <f <h = 1-0- 

If /“ and r are computed based on multiple experiments (i.e., multiple 
noisy samples /i,..., fm, of the true data /), the interval [/“’*, /^’*] will con- 
verge in probability to the true data f\ as the number of experiments m 
increases. Thus we obtain a probabilistic version of the convergences in 
(2.3). 

Let us try to see, how such intervals might work in practice. For the 
purpose of the present discussion, assume that the noise is additive and 
normal-distributed with variance a and zero mean—an assumption that 
does not hold in practice, as we have already seen in Figure 1, but will 
suffice for the next thought experiments. That is, fj — f A nj for the noise 
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Uj. Let the sample mean of be fm = ■ ■ ■ Im)> pointwise 

sample variance a = , a'^). The product of the pointwise confidence 

intervals with confidence 1 — 0 is [15,46] 


n 



for $ the cumulative normal distribution function. For 6 — 0.05, i.e., the 
95% confidence interval, $“^(0.05/2) = 1.96. Now, the probability that Ii 
covers the true /Ms 1 — 6, e.g. 95%. If we have only a single sample m = 1, 
the intervals stay large, but the joint probability, (1 — 9)^ goes to zero as n 
increases. As an example, for a rather typical single 128 x 128 slice of a DTI 
measurement, the probability that exactly (j) = 5% (to the closest discrete 
value possible) of the 1 — 0 = 95% confidence intervals do not cover the 
true parameter would be about 1.4%, or 



The probability of at least 0 = 5% of the pointwise 95% confidence intervals 
not covering the true parameter is in this setting approximately 49%. This 
can be verified by summing the above estimates over m = [(/)n],..., n. 

In summary, unless 9 simultaneously goes to 1, the product intervals are 
very unlikely to cover the true parameter. Based on a single experiment, 
the deterministic approach as interpreted statistically through confidence 
intervals, is therefore very likely to fail to discover the true solution as the 
data size n increases unless the pointwise confidence is very low. But, if we 
let the pointwise confidences be arbitrarily high, such that the intervals are 
very large, the discovered solution in our applications of interest would be 
just a constant! 

Asymptotically, the situation is more encouraging. Indeed, if we could 
perform more experiments to compute the confidence intervals, then for 
any fixed n and 9, it is easy to see that the solution of the "deterministic" 
error model is an asymptotically consistent and hence asymptotically unbi¬ 
ased estimator of the true /. That is, the estimates converge in probability 
to / as the experiment count m increases. Indeed, the error-bounds based 
estimator /^, based on m experiments, by definition satisfies fm G IlILi 
Therefore, we have 

P(|/^ — /^| > e for some i) = 0 whenever m > • 

Thus / —> / in probability. Since by the law of large numbers also fm —^ f, 
this proves the claim, and to some extent justifies our approach from the 
statistical viewpoint. 
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It should be noted that this is roughly the most that has previously been 
known of the maximum a posteriori estimate (MAP), corresponding to the 
Tikhonov models ^ 

min -||/ — + aR{u). 

In particular, the MAP is not the Bayes estimator for the typical squared 
cost functional. This means that it does not minimise / k^E[||/ — / |p]. The 
minimiser in this case is the conditional mean (CM) estimate, which is why 
it has been preferred by Bayesian statisticians despite its increased compu¬ 
tational cost. The MAP estimate is merely an asymptotic Bayes estimator 
for the uniform cost function. In a very recent work [12], it has however 
been proved that the MAP estimate is the Bayes estimator for certain Breg- 
man distances. One possible critique of the result is that these distances are 
not universal and do depend on the regulariser i?, unlike the squared dis¬ 
tance for CM. The CM estimate however has other problems in the setting 
of total variation and its discretisation [34, 33]. 

3 Application to diffusion tensor imaging 

We now build our model for applying the deterministic error modelling 
theory to diffusion tensor imaging. We start by building our forward model 
based on the Stejskal-Tanner equation, and then briefly introduce the regu- 
larisers we use. 

3.1 The forward model 

For u : ^ ^ Sym^(M^), ft c M^, a mapping from to symmetric second 
order tensors, let us introduce non-linear operators Tj, defined by 

[Tj{u)]{x) := so{x)ex.p{-{bj,u{x)bj)), (j = 1,...,N). 

Their role is to model the so-called Stejskal-Tanner equation [4] 

Sj{x) = so{x)eyip{-{bj,u{x)bj)), {j = l,...,N). (3.1) 

Each tensor u{x) models the covariance of a Gaussian probability distri¬ 
bution at X for the diffusion of water molecules. The data Sj G Lp‘{ft), 
{j = 1,..., N), are the diffusion-weighted MRI images. Each of them is 
obtained by performing the MRI scan with a different non-zero diffusion 
sensitising gradient hj, while sq is obtained with a zero gradient. After cor¬ 
recting the original /c-space data for coil sensitivities, each Sj is assumed 
real. As a consequence, any measurement sj of sj has—in theory—Rician 
noise distribution [23]. 
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Our goal is to reconstruct u with simultaneous denoising. Following 
[52,29], we consider using a suitable regulariser R the Tikhonov model 

^ 1 

(3-2) 

u^O —: z 

The constraint ^ 0 is to be understood in the sense that u{x) is positive 
semidefinite for >C^-a.e. x ^ Q (see Appendix for more details). Due to the 
Rician noise of sj, the Gaussian noise model implied by the L^-norm in (3.2) 
is not entirely correct. However, in some cases the model may be accu¬ 
rate enough, as for suitable parameters the Rician distribution is not too far 
from a Gaussian distribution. If one were to model the problem correctly, 
one should either modify the fidelity term to model Rician noise, or include 
the (unit magnitude complex number) coil sensitivities in the model. The 
Rician noise model is highly nonlinear due to the Bessel functional loga¬ 
rithms involved. Its approximations have been studied in [5, 21, 37] for 
single MR images and DTI. Coil sensitivities could be included either by 
knowing them in advance, or by simultaneous estimation as in [28]. Ei¬ 
ther way, significant complexity is introduced into the model, and for the 
present work, we are content with the simple model. 

We may also consider, as is often the case, and as was done with TGV 
in [54], the linearised model 

min \\f — u\\‘^ -\- aR{u)^ (3.3) 

where, for each x G ri, f{x) is solved by regression for u{x) from the sys¬ 
tem of equations (3.1) with Sj{x) — Sj{x). Further, as in [55], we may also 
consider 

^ 1 

min y] -|bj - Ajuf + aR{u), (3.4) 

u>0 —: Z 

with. [Aju]{x) := — {bj^u{x)bj), and gj{x) := log{sj{x)/so{x)). In both of 
these linearised models, the assumption of Gaussian noise is in principle 
even more remote from the truth than in the nonlinear model (3.2). We will 
employ (3.3) and (3.2) as benchmark models. 

We want to further simplify the model, and forgo with accurate noise 
modelling. After all, we often do not know the real noise model for the data 
available in practice. It can be corrupted by process artefacts from black¬ 
box algorithms in the MRI devices. This problem of black box devices has 
been discussed extensively in [41], in the context of Computed Tomogra¬ 
phy. Moreover, as we have discussed above, even without such artefacts, 
the correct model may be difficult to realise numerically. So we might be 
best off choosing the least assuming model of all - that of error bounds. 
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This is what we propose in the reconstruction model 

min R{u) subject to (3.5) 

“ 5 ' ^ Aju ^ g^, C^-a.e., (j = 1,..., N). 

Here := log(sys^) and gj := log(s^/s[,), g'j,gJ G are our upper 

and lower bounds on gj that we derive from the data. 


3.2 Choice of the regulariser R 


A prototypical regulariser in image processing is the total variation, first 
studied in this context in [42]. It can be defined for a symmetric tensor field 

u e L^{n-Sym'^{W^)) as 




:= sup ly (div</)(a:),M(a:)) da: 


0 € (7^(12; Sym^+i(R"»)) 
sup^ \\(f){x)\\F < 1 


Observe that for scalar or vector fields, i.e., the cases /c = 0,1, we have 
Sym^(M^) = = M, and Sym^(M^) = Therefore, for 

scalars in particular, this gives the usual isotropic total variation 


TY{u) = ||HM||;n(n)). 

Total generalised variation was introduced in [9] as a higher-order ex¬ 
tension of TV. Following [54], the second-order variant may be defined 
using the differentiation cascade formulation for symmetric tensor fields 
u G Sym^(M’^)) as the marginal 

:= «;) | w G L\n-Sym’^+\R^))} (3.6) 

for 


O'WEu ^llF,A^(0;Sym^+2(Mrn))• 

It turns out that the standard BV norm 


ll'^llBV(0;Sym^(M"^)) •“ ll'^llLi(0;Sym^(M"^)) + 

and the "BGV norm" [9] 

ll^ir ll'^llLi(0;Sym^(M"^)) 

are topologically equivalent norms [10,11] on BV(fl; Sym^(M’^)), yielding 
the same convergence results for TGV regularisation as for TV regularisa- 
tion. The geometrical regularisation behaviour is however different, and 
TGV tends to avoid the staircasing observed in TV regularisation. 

Regarding topologies, we say that a sequence {u^}m BV(fl;Sym^(M^)) 
converges weakly"^ to u, if u'^ ^ u strongly in L^, and Eu^ ^ Eu weakly"^ as 
Radon measures [2,48,54]. The latter is characterised as J^(div(/)(x), dx ^ 
J^(div(/)(x), i/(x)) dx for all (j) G Sym^^^(M’^)). 
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3.3 Compact subspaces 

Now, for a weak* lower semi-continuous seminorm R on BV(fl; Sym^ (IR™)), 
let us set 


BVo,i?(f^;Sym'^(M™)) := BV(fl; Sym^(M™))/keri?. 

That is, we identify elements u,u e BV(fl; Sym^(R"*)), such that R{u—u) = 
0. Now i? is a norm on BVo,i?(f^; Sym^(]R™)); compare, e.g., [39] for the case 
of i? = TV. 

Suppose 

ii^^ir := ii^^iili(o)+-RH 

is a norm on BV(f2; Sym^(R"*)), equivalent to the standard norm. If also 
the R-Sobolev-Korn-Poincare inequality 

~ ^ CR{u) (3.7) 

holds, we may then bound 


inf 

R{v)=0 


= “ ^’IIli(o)+R(« - v)) 


< C'{l + C)R{u). 


By the weak* lower semicontinuity of the BV-norm, and the weak* com¬ 
pactness of the unit ball in BV(fl; Sym^(M"^))—we refer to [2] for these and 
other basic properties of BV-spaces—we may thus find a representative u 
in the BVo,i?(fI; Sym^(M™)) equivalence class of u, satisfying 

ll“llBV(r2;Sym''(R”»)) < C' (1 + C)R{u). 

Again using the weak* compactness of the unit ball in BV(n; Sym^(]R"^)), 
and the weak* lower semicontinuity of R, it follows that the sets 

lev„ R-.= {ue BVo,r( 52; Sym''(R"*)) | R{u) < a}, (a > 0), 

are weak* compact in BVo,i?(fI; Sym^(M™)), in the topology inherited form 
BV(f2; Sym^(R"*)). Consequently, they are strongly compact subsets of 
Sym^(]R"^)). This feature is crucial for the application of the regu- 
larisation theory in Banach lattices above. 

On a connected domain fl, in particular 


BVo,Tv(f^) - 



That is, the space consists of zero-mean functions. Then u i-> ||i9M||x(a;Rm) 
is a norm on BVo,Tv(f^) [39], and this space is weak* compact. In particular, 
the sets leva TV are compact in L^(f2). 
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More generally, we know from [8] that on a connected domain fl, ker TV 
consists of Sym^(]R™)-valued polynomials of maximal degree k. By exten¬ 
sion, kerTGV^ consists of Sym^(]R"*)-valued polynomials of maximal de¬ 
gree k + 1. In both cases, (3.7), weak* lower semicontinuity of R, and the 
equivalence of II • ||'to || • hold by the results in [8, 11,48]. 

Therefore, we have proved the following. 

Lemma 3.1. Let Ll c and k > 0. Then the sets lev^ TV and lev^ TGV^ are 
weak* compact in BV(f2; Sym^(R"*)) and strongly compact in L^(f2; Sym^(R"*)). 

Now, in the above cases, ker R is finite-dimensional, and we may write 

BV(fl; Sym'=(M"*)) ~ BVo,i?(fl; Sym'=(M"*)) © ker i?. 

Denoting by 

Bx{r) := {x e V I ||a;|| < r}, 

the closed ball of radius r in a normed space X, we obtain by the finite- 
dimensionality of ker R the following result. 

Proposition 3.1. Let Ll c MX and k >0. Pick a > 0. Then the sets 

V := leVaRQ BkerR(a) 

for R = TV and R = TGV^ are weak* compact in BV(f2; Sym^(R"*)) and 
strongly compact in Sym^(]R™)). 

The next result summarises Theorem 2.3 and Proposition 3.1. 

Theorem 3.1. With U = Sym^(]R™')), let the operator A : U ^ F be 

linear, continuous and injective. Let and /“ be sequences of lower and upper 
bounds for the right-hand such that 

/ I . Jtl \ Jtl ru . £U ^ £U 

n' Jn-\-l ^ Jnt Jn' Jnt 

/n^/^/n> ll/n-/nll^0 aSn^OO. 

Supposing that there are no errors in the operator A and the exact solution u exists, 
define the feasible set as follows 

Un = {ueU: fl^^pAu^Ffn- 

Decomposing u e U as u = uq u-^ with u-^ G ker R, suppose 

U G Un => 11^"*" II < O (3-8) 

for some constant a > 0, then for R = TV and R = TGV^, the sequence 

Un = arg min R{u) 

ueUn 

converges strongly in L^{Ll-,Synf{M^)) to the exact solution u and R{un) —>■ 
R{u). 
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Proof. With the decomposition Un = + uf, where uf G ker i?, we have 

uo^n ^ leva R for suitably large a > 0 through 

R{uQ^ri) = R{un) = min R{u) < R{u). 

u'eUn 

The assumption (3.8) bounds ||?i^|| < a. Thus Un ^ V for V as in Propo¬ 
sition 3.1. The proposition thus implies the necessary compactness inU = 
(fl; Sym^(M^)) for the application of Theorem 2.3. □ 

Remark 3.1. The condition (3.8) simply says for R = TV that the data has to 
bound the solution in mean. This is very reasonable to expect for practical 
data; anything else would be very non-degenerate. For R = TGV^ we 
also need that the data bounds the entire affine part of the solution. Again, 
this is very likely for real data. Indeed, in DTI practice, with at least 6 
independent diffusion sensiting gradients, A is an invertible or even over¬ 
determined linear operator. In that typical case, the bounds and ff will 
be translated into Un being a bounded set. 

4 Solving the optimisation problem 

4.1 The Chambolle-Pock method 

The Chambolle-Pock algorithm is an inertial primal-dual backward-back- 
ward splitting method, classified in [19] as the modified primal-dual hybrid 
gradient method (PDHGM). It solves the minimax problem 

min max G{x) + {Kx^y) — ^"*(2/), (4.1) 

X y 

where G : X ^ R and F* : V ^ M are convex, proper, lower semicon- 
tinuous functionals on (finite-dimensional) Hilbert spaces X and Y. The 
operator K : X ^ Y is linear, although an extension of the method to non¬ 
linear K has recently been derived [52]. The PDHGM can also be seen as a 
preconditioned ADMM (alternating directions method of multipliers); we 
refer to [19, 44, 53] for reviews of optimisation methods popular in image 
processing. For step sizes r, n > 0, and an over-relaxation parameter cj > 0, 
each iteration of the algorithm consists of the updates 

Ui+i := (/ + TdG)~^{ui - TK*yi), (4.2a) 

Ui+i := Ui+i + Lo{ui+i - Ui), (4.2b) 

Vi+i := (/ + adF*)-\yi + aKui+i). (4.2c) 

We should remark that the order of the primal (u) and dual (y) updates 
here is reversed from the original presentation in [13]. The reason is that 
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when reordered, the updates can, as discovered in [25], be easily written in 
a proximal point form. 

The first and last update are the backward (proximal) steps for the pri¬ 
mal (x) and dual (y) variables, respectively, keeping the other fixed. How¬ 
ever, the dual step includes some "inertia" or over-relaxation, as specified 
by the parameter cj. Usually cj = 1, which is required for convergence 
proofs of the method. If G or F* is uniformly convex, by smartly choos¬ 
ing for each iteration the step length parameters r, a, and the inertia uj, the 
method can be shown to have convergence rate 0(1/A^^). This is similar 
to Nesterov's optimal gradient method [40]. In the general case the rate 
is 0(1/A^). In practice the method produces visually pleasing solutions in 
rather few iterations, when applied to image processing problems. 

In implementation of the method, it is crucial that the resolvents (/ + 
TdG)~^ and (/ + can be computed quickly. We recall that they 

may be written as 



Usually in applications, these computations turn out to be simple projec¬ 
tions or linear operations - or the soft-thresholding operation for the Li- 
norm. 

As a further implementation note, since the algorithm (4.2) is formu¬ 
lated in Hilbert spaces (see however [?]), and we work in the Banach space 
BV(i2; Sym^(M^)), we have to discretise our problems before application of 
the algorithm. We do this by simple forward-differences discretisation of 
the operator E with cell width /i = 1 on a regular rectangular grid corre¬ 
sponding to the image voxels. 

4.2 Implementation of deterministic constraints 

We now reformulate the problem (3.5) of DTI imaging with deterministic 
error bounds in the form (4.1). Suppose we have some upper and lower 
bounds ^ sj ^ s'j on the DWI signals sj, (j = 0,..., A^). Then the 
bounds for gj = log(5j/so) are 


9^j = log(s'7so); 9j = log(s“/s[,), {j = l,...,N), (4.3) 


because gj is monotone in regards to Sj. We are thus trying to solve 


u= argmin R{u') 
u'€U=r\Ui 


(4.4) 


where 


W = {u: g] ^ AjU ^ g^} 
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For the ease of notation, we write 

9 (^1? • • • 5 Qn^ 5 and 

Au=[Aiui, Anun) , (4.5) 

so that the Stejskal-Tanner equation is satisfied with 

Au = g. 


The problem (4.4) may be rewritten as 

min Fo{Au) + R{u)^ 

9// 


for 


Fo{y) = Kg'" 

with 6{g^ ^ y < p") denoting the indicator function of the convex set 
{y-g^'^y^ g^}- Solving the conjugate 


^o(y) 


{g\y), y<o 

(y“,y), y>0. ’ 


and also writing 

R{u) = Ro{Kou) 

for some Rq and Kq, the problem can further be written in the saddle point 
form (4.1) with 

G{u) = 0, 



F*{y,^P) = F^{y) + R*o{^P). 

To apply algorithm (4.2), we need to compute the resolvents of Gq and i?g. 
For details regarding for R = TGV^^ and R = a TV in the discretised 
setting, we refer to [54, 55]; here it suffices to note that for R = a TV, we 
have Kq = aE and Ro{(f)) is the indicator function of the dual ball {0 | 
sup^ ||</)(a;)||ir < 1}. Thus the resolvent (I + rdi?o)~^ becomes a projection 
to the dual ball. The case of i? = TGV^^ is similar. For Fq we have 

(/ + TdF^)-^{y) = argminF(j'(y') + ^ ^ , 

y' 2r 

which resolves pointwise at each ^ E into the expression 

[{I + rdF^)-\ym) = S{y{0) 
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for 


S{yiO) 


'yiO-aHO^, y{0>9^i0^, 

' 0, 9^(0^ <yiO < 9''i0<^, 

.yiO - 9''i0^, yiO < 9\0^- 


Finally, we note that the saddle point system (4.1) has to have a solution 
for the Chambolle-Pock algorithm to converge. In our setting, in particular, 
we need to find error bounds and for which there exists a solution u 
to 


g^ ^ Au ^ g^. 


(4.6) 


If one uses at most six independent diffusion directions {N = 6), as we 
will, then, for any g, there in fact exists a solution to ^ = Au. The condition 
(4.6) becomes 9^ ^ g'^, immediately guaranteed through the monotonicity 
of (4.3), and the trivial conditions s^- ^ s'j. 

We are thus ready to apply the algorithm (4.2) to diffusion tensor imag¬ 
ing with deterministic error bounds. For the realisation of (4.2) for models 
(3.3), (3.4), and (3.2), we refer to [54, 55,56, 52]. 


5 Experimental results 

We now study the efficiency of the proposed reconstruction model in com¬ 
parison to the different L^-squared models, i.e., ones with Gaussian error 
assumption. This is based on a synthetic data, for which a ground-truth is 
available, as well as a real in vivo DTI data set. First we, however, have to 
describe in detail the procedure for obtaining upper and lower bounds for 
real data, when we do not know the true noise model, and are unable to 
perform multiple experiments as required by the theory of Section 2.2. 

5.1 Estimating lower and upper bounds from real data 

As we have already discussed, in practice the noise in the measurement sig¬ 
nals Sj is not Gaussian or Rician; in fact we do not know the true noise dis¬ 
tribution and other corruptions. Therefore, we have to estimate the noise 
distribution from the image background. To do this, we require a known 
correspondence between the measurement, the noise, and the true value. 
As we have no better assumptions available, the standard one that we use 
is that of additive noise. Continuing in the statistical setting of Section 2.2, 
we now describe the procedure, working on discrete images expressed as 
vectors f = Sj E for some fixed j G {0,1,...,A}. We use superscripts 
to denote the voxel indices, that is / = ..., 

In the i-th voxel, the measured value /* is the sum of the true value /* 
and additive noise A: 

f = f + 
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All are assumed independent and identically distributed (i.i.d.), but their 
distribution is unknown. If we did know the true underlying cumulative 
distribution function F of the noise, we could choose a confidence param¬ 
eter 9 G (0,1) and use the cumulative distribution function to calculate 
I'e 12 , 1 ^ 1-012 suchthat^ 

P{i^ 0/2 ^ v\- 0 / 2 ) = 1-6'- (5.1) 

Let us instead proceed non-parametrically, and divide the whole image 
into two groups of voxels - the ones belonging to the background region 
and the rest. For simplicity, let the indices i = 1,.. ., /c, (A: < n), stand for the 
background voxels. In this region, we have = 0 and P — F. Therefore, 
the background region provides us with a number of independent samples 
from the unknown distribution of v. The Dvoretzky-Kiefer-Wolfowitz in¬ 
equality [18, 38, 35] states that 

P(sup|F(t)-Ffc(t)| >e) <2e-2^^', 
for the empirical cumulative distribution function 

k 

1=1 

Therefore, computing z/ 51/2 and z^i_ 6 i /2 such that 

Pk{j^0/2 ^ ^ ^ 1 - 0 / 2 ) = 1-6', 

we also have 

Pkir + ^e/2 ^ r ^ r + ^i-e/2) = — 9. 

We may therefore use the values 

/M = fi _ fU,i ^ fi _ (5 2) 

as lower and upper bounds for the true values P outside the background 
region. 

The Dvoretzky-Kiefer-Wolfowitz inequality implies that the interval 
estimates converge to the true intervals, determined by (5.1), as the num¬ 
ber of background pixels k increases with the image size n. This proce¬ 
dure, with large k, will therefore provide an estimate of a single-experiment 
(m = 1 ) confidence interval for p. We note that this procedure will, how¬ 
ever, not yield the convergence of the interval estimate to the true 

data; for that we would need multiple experiments, i.e., multiple sample 
images (m > 1 ), not just agglomeration of the background voxels into a 
single noise distribution estimate. In practice, however, we can only afford 
a single experiment (m = 1 ), and cannot go to the limit. 

^Recall that for a random variable X with a cumulative distribution function F, the 
quantile function F~^ returns a number xe = F~^{0) such that P{X ^ xe) — 0. 
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5.2 Verification of the approach with synthetic data 

To verify the effectiveness of the considered approach and to compare it to 
the other models, we use synthetic data. For the ground-truth tensor field 
Ug.t. we take a helix region in a 3D box 100 x 100 x 30, and choose the tensor 
in each point inside the helix in such a way that the principal eigenvector 
coincides with the helix direction (Figure 2). The helix region is described 
by the following equations: 

X = (i? + rcos{9)) cos((/)), 
y — {R + rcos{9)) sin((/)), 

2 ; = rsin( 6 >) + 0 / 0 max, 

0 G [0, 0max] 5 r G [0, rmax] 5 ^ ^ [0 7 ]. 

The vector direction in every point coincides with helix direction: 

/—i?sin((/))\ 
r— Rcos{(p) 

\ 1 / 0 max / 

We take the parameters R = 0.3, (/)max = 47 r, rmax = 0.07 in this numerical 
experiment. 

We apply the forward operators Tj{ug,t), {j = 0 ,..., 6), to obtain the 
data Sj{x). We then add Rician noise to this data sj = sj + 6 with a = 2, 
which corresponds to PSNR 27dB. 

We apply several models for solving the inverse problem of reconstruct¬ 
ing u: the linear and non-linear approaches (3.3) and (3.2), and the con¬ 
strained problem (3.5). As the regulariser we use R = TGV^q where 
the choice [3 = 0.9n was made somewhat arbitrarily, however yielding 
good results for all the models. This is slightly lower than the range [ 1 ,1.5]n 
discovered in comprehensive experiments for other imaging modalities [7, 
16]. 

For the linear and non-linear models (3.3) and (3.2), respectively, the 
regularisation parameter a is chosen either by a version of the discrepancy 
principle for inconsistent problems [49] or optimally with regard to the || • 
\\f ^2 distance between the solution and the ground-truth. In case of the 
discrepancy principle, such an a was chosen that the following equality 
holds: 

Apia) = = 0 ( 5 . 3 ) 

j j 

We find a by solving this equation numerically using bisection method. 
We start by finding such 01,02 that Ap(oi) > 0 and Ap(o 2 ) < 0 . We 
calculate Ap(o 3 ), 03 = and depending on its sign replace either oi or 

02 with 03 . We repeat this procedure until the stopping criteria is reached. 
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Figure 2: Helix vector field for the principal eigenvector of the ground-truth 
tensor field 


As stopping criteria we use |/(n)| < c. We use r = 1.05, e = 0.01 for 
linear and r — 1.2, e — 0.0001 for non-linear L? solution. A value of r yield¬ 
ing a reasonable degree of smoothness has been chosen by trial and error, 
and is different for the non-linear model, reflecting a different non-linear 
objective in the discrepancy principle. For the constrained problem we cal¬ 
culate 9 = 90%, 95%, and 99% confidence intervals to generate the upper 
and lower bounds. We however digress a little bit from the approach of 
Section 2.2. Minding that we do not know the true underlying distribution, 
which fails to be Rician as illustrated in Figure 1, we do not use it to calcu¬ 
late the confidence intervals, but use the estimation procedure described in 
Section 5.1. We stress that we only have a single sample of each signal Sj, 
so are unable to verify any asymptotic estimation properties. 

The numerical results are in Table 1 and Figures 3-5, with the first of the 
figures showing the colour-coded principal eigenvector of the reconstruc¬ 
tion, the second showing the fractional anisotropy and principal eigenvec¬ 
tors, and the last one the errors in the latter two, in a colour-coded manner. 
All plots are masked to represent only the non-zero region. The field of 
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Table 1: Numerical results for the synthetic data. For the linear and non¬ 
linear the free parameter chosen by the parameter choice criterion is 
the regularisation parameter n, and for the constrained problem it is the 
confidence interval. 


Method 

Parameter choice 

Frobenius 

PSNR 

Pr. e.val. 
PSNR 

Pr. e.vect. 
angle PSNR 

Regression 


33.90dB 

25.04dB 

47.86dB 

Linear 

Discr. Principle 

32.93dB 

27.81dB 

61.89dB 

Linear 

Frob. Error-optimal 

34.51dB 

28.42dB 

60.93dB 

Non-linear 

Discr. Principle 

37.33dB 

27.81dB 

61.89dB 

Non-linear 

Frob. Error-optimal 

37.44dB 

28.03dB 

61.12dB 

Constraints 

90% 

32.28dB 

28.86dB 

65.65dB 

Constraints 

95% 

30.97dB 

28.14dB 

64.80dB 

Constrains 

99% 

27.86dB 

24.51dB 

61.41dB 


fractional anisotropy is defined for a field u of 2-tensors on c as 

FA„(i) = (E”i(A. - 1 Af)^*''" e |0.1], {X e fi), 

with Ai,..., Arn denoting the eigenvalues of u{x). It measures how far the 
ellipsoid prescribed by the eigenvalues and eigenvectors is from a sphere, 
with FA^(x) = 1 corresponding a full sphere, and FA^i(x) = 0 correspond¬ 
ing to a degenerate object not having full dimension. 

As we can see, the non-linear approach (3.2) performs overall the best 
by a wide margin, in terms of the pointwise Frobenius error, i.e., error in 
II • This is expressed as a PSNR in Table 1. What is, however, interesting, 

is that the constraint-based approach (3.5) has a much better reconstruction 
of the principal eigenvector angle, and a comparable reconstruction of its 
magnitude. Indeed, the 95% confidence interval in Figure 3(g) and Figure 
4(g) suggests a nearly perfect reconstruction in terms of smoothness. But, 
the Frobenius PSNR in Table 1 for this approach is worse than the simple 
unregularised inversion by regression. The problem is revealed by Figure 
5(f): the large white cloudy areas indicate huge fractional anisotropy errors, 
while at the same time, the principal eigenvector angle errors expressed in 
colour are much lower than for other approaches. Good reconstruction of 
the principal eigenvector is important for the process of tractography, i.e., 
the reconstruction of neural pathways in a brain. One explanation for our 
good results is that the regulariser completely governs the solution in areas 
where the error bounds are inactive due to generally low errors. This re¬ 
sults in very smooth reconstructions, which is in the present case desirable 
as our synthetic tensor field is also smooth within the helix. 
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(a) Ground-truth (b) Regression re- (c) Linear dis- (d) Linear 
suit crepancy principle error-optimal 



(e) Non-linear (f) Non-linear (g) Constrained, (h) Constrained, 
discrepancy prin- error-optimal 95% confidence 90% confidence 
ciple intervals intervals 

Figure 3: Synthetic test data results, (a) ground-truth plot. 

(b) regression result plot, (c)-(h) Plot of a slice of the solution 
for L^, non-linear and constrained models. The legend 
on the right indicates the colour-coding of directions of the 
principal eigenvector plotted. 


5.3 Results with in vivo brain imaging data 

We now wish to study the proposed regularisation model on a real in- 
vivo diffusion tensor image. Our data is that of a human brain, with the 
measurements of a volunteer performed on a clinical 3T system (Siemens 
Magnetom TIM Trio, Erlangen, Germany), with a 32 channel head coil. A 
2D diffusion weighted single shot EPI sequence with diffusion sensitising 
gradients applied in 12 independent directions {b = lOOOs/mm^). An ad¬ 
ditional reference scan without diffusion was used with the parameters: 
TR = 7900ms, TE = 94ms, flip angle 90°. Each slice of the 3D data set 
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(a) Ground-truth (b) Regression re- (c) Linear dis- (d) Linear 
suit crepancy principle error-optimal 



(e) Non-linear (f) Non-linear (g) Constrained, (h) Constrained, 
discrepancy prin- error-optimal 95% confidence 90% confidence 
ciple intervals intervals 


Figure 4: Fractional anisotropy in greyscale superimposed 
by principal eigenvector. Legend on left indicates the 
greyscale intensities of the fractional anisotropy. 



-> 

1 


has plane resolution 1.95mm x 1.95mm, with a total of 128 x 128 pixels. 
The total number of slices is 60 with a slice thickness of 2mm. The data set 
consists of 4 repeated measurements. The GRAPPA acceleration factor is 2. 
Prior to the reconstruction of the diffusion tensor, eddy current correction 
was performed with FSL [47]. Written informed consent was obtained from 
the volunteer before the examination. 

For error bounds calculation according to the procedure of Section 5.1, 
to avoid systematic bias near the brain, we only use about 0.6% of the total 
volume near the borders, or roughly k ^ 6000 voxels. 

To estimate errors for the all the considered reconstruction models, for 
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(a) Regression re- (b) Linear dis- (c) Linear (d) Non-linear 

suit crepancy principle error-optimal discrepancy prin¬ 

ciple 





(e) Non-linear (f) Constrained, (g) Constrained, 

error-optimal 95% confidence 90% confidence 

intervals intervals 


Figure 5: Colour-coded errors of fractional anisotropy and 
principal eigenvector for the computations on the synthetic ^ 

test data. Legend on the right indicates the colour-coding ^ j| 
of errors between u and as functions of the principal o° 
eigenvector angle error 9 = cos~^{{vu->^gQ)) in terms of the 
hue, and the fractional anisotropy error epA = |FA^ — FA^q | 
in terms of whiteness. 


each gradient direction bi we use only one out of the four duplicate mea¬ 
surements. We then calculate the errors using a somewhat less than ideal 
pseudo-ground-truth, which is the linear regression reconstruction from all 
the available measurements. 

The results are in Table 2 and Figures 6-8, again with the first of the 
figures showing the colour-coded principal eigenvector of the reconstruc- 
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Table 2: Numerical results for the in-vivo brain data data. For the and 
non-linear reconstruction models the free parameter chosen by the pa¬ 
rameter choice criterion is the regularisation parameter n, and for the con¬ 
strained problem it is the confidence interval. 


Method 

Parameter choice 

Frobenius 

PSNR 

Pr. e.val. 
PSNR 

Pr. e.vect. 
angle PSNR 

Regression 


32.35dB 

33.67dB 

28.56dB 

Linear 

Discr. Principle 

34.80dB 

36.35dB 

24.81dB 

Linear 

Frob. Error-optimal 

34.81dB 

36.32dB 

24.97dB 

Non-linear 

Discr. Principle 

33.53dB 

35.87dB 

27.12dB 

Non-linear 

Frob. Error-optimal 

33.57dB 

36.03dB 

27.58dB 

Constraints 

90% 

33.71dB 

34.93dB 

27.00dB 

Constraints 

95% 

33.70dB 

34.97dB 

26.91dB 

Constraints 

99% 

33.67dB 

34.89dB 

26.88dB 


tion, the second showing the fractional anisotropy and principal eigenvec¬ 
tors, and the last one the errors in the latter two, in a colour-coded manner. 
Again, all plots are masked to represent only the non-zero region. In the fig¬ 
ures, we concentrate on error bounds based on 95% confidence intervals, as 
the results for the 90% and 99% cases do not differ significantly according 
to Table 2. 

This time, the linear approach (3.3) has best overall reconstruction 
(Frobenius PSNR), while the nonlinear approach (3.2) has clearly the 
best principal eigenvector angle reconstruction besides the regression, which 
does not seem entirely reliable regarding our regression-based pseudo-ground- 
truth. The constraints based approach (3.5), with 95% confidence intervals 
is, however, not far behind in terms of numbers. More detailed study of the 
corpus callosum in Figure 8 (small picture in picture) and Figure 7 however 
indicates a better reconstruction of this important region by the nonlinear 
approach. The constrained approach has some very short vectors there 
in the white region. Naturally, however, these results on the in vivo data 
should be taken with a grain of salt, as we have only a somewhat unreliable 
pseudo-ground-truth available for comparison purposes. 

5.4 Conclusions from the numerical experiments 

Our conclusion is that the error bounds based approach is a feasible alter¬ 
native to standard modelling with incorrect Gaussian assumptions. It can 
produce good reconstructions, although the non-linear approach of [52] 
is possibly slightly more reliable. The latter does, however, in principle de¬ 
pend on a good initialisation of the optimisation method, unlike the convex 
bounds based approach. 

Further theoretical work will be undertaken to extend the partial-order- 
based approach to modelling errors in linear operators to the non-lattice 
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(a) Pseudo-ground-tmth (b) Regression result (c) Linear discrepancy 

principle 



(d) Linear error- (e) Non-linear discrep- (f) Non-linear error- 

optimal ancy principle optimal 



(g) Constrained, 95% con¬ 
fidence intervals 


Figure 6: Reconstruction results on the in vivo brain data, (a) Pseudo¬ 
ground-truth plot, (b) regression result, (c)-(g) Plot of a slice of the solution 
for non-linear and constrained problem approach. The legend on 
the bottom-right indicates the colour-coding of directions of the principal 
eigenvector plotted. 


case of the semidefinite partial order for symmetric matrices, which will 
allow us to consider problems of diffusion MRI with errors in the forward 
operator. 

It shall also be investigated whether the error bounds approach needs to 
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(d) Non-linear dis- (e) Non-linear error- (f) Constrained, 95% con- 
crepancy principle optimal fidence intervals 

Figure 7: Fractional anisotropy of the corpus callosum in 
greyscale superimposed by principal eigenvector. Legend M . 

on the right indicates the greyscale intensities of the frac- o fa i 

tional anisotropy. 


be combined with an alternative, novel, regulariser that would ameliorate 
the fractional anisotropy errors that the approach exhibits. It is important 
to note, however, that from the practical point of view, of using the recon¬ 
struction tensor field for basic tractography methods based solely on prin¬ 
cipal eigenvectors, these are not that critical. As pointed out by one of the 
reviewers, the situation could differ with more recent geodesic tractogra¬ 
phy methods [?, ?, ?] employing the full tensor. We provide basic principal 
eigenvector tractography results for reference in Figure 9, without attempt¬ 
ing to extensively interpret the results. It suffices to say that the results look 
comparable. With this in sight, the error bounds approach produces a very 
good reconstruction of the direction of the principal eigenvectors, although 
we saw some problems with the magnitude within the corpus callosum. 
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(a) Linear discrepancy (b) Linear error- (c) Non-linear discrep- 

principle optimal ancy principle 



(d) Non-linear error- (e) Constrained, 95% con- 
optimal fidence intervals 


Figure 8: Colour-coded errors of fractional anisotropy and principal eigen¬ 
vector for the computations on the synthetic test data. Legend on the right 
indicates the colour-coding of errors between u and as functions of the 
principal eigenvector angle error 9 = cos~^{{vu,VgQ)) in terms of the hue, 
and the fractional anisotropy error epA = IFA^^ — FA^q| in terms of white¬ 
ness. 
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(a) Pseudo-ground-tmth (b) Regression result (c) Linear discrepancy 

principle 



(d) Non-linear dis- (e) Non-linear error- (f) Constrained, 95% con- 
crepancy principle optimal fidence intervals 


Figure 9: Tractography visualization results on the in vivo brain data, (a) 
Pseudo-ground-truth plot, (b) regression result, (c)-(f) Plot of a slice of the 
solution for non-linear and constrained problem approach. 
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Appendix: Notation and techniques 

We recall some basic, not completely standard, mathematical notation and 
concepts in this appendix. We begin with partially ordered vector spaces, 
following the book [43]. Then we proceed to tensor calculus and tensor 
fields of bounded variation and of bounded deformation. These are also covered 
in more detail for the diffusion tensor imaging application in [54]. 
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Banach lattices 

A linear space X, endowed with a partial order relation < is called an or¬ 
dered vector space if the partial order agrees with linear operations in the 
following way: 

x^y x + z^y + z \lx^y^z^X^ 

X ^ y =4> Xx ^ Xy Vx, y ^ X and A G M+. 

An ordered vector space is called a vector lattice if each pair of elements 
x^y ^ X have a supremum xM y ^ X and infimum x Ay ^ X. Sup remum 
of two elements x, ^ of a Banach lattice X is the element z = x\/ y with the 
following properties: z > x, z y and Vz G X such that z ^ x and z y 
we have z z. 

For any x G X, the element = x V 0 is called its positive part, the 
element x- = (—x) VO = (—3:^)+ is called its negative part, the element \x\ = 

+ X- is its absolute value. The equalities x = x^ — x- and \x\ = xV (—x) 
hold for any x G X. 

It is obvious that suprema and infima exist for any finite number of 
elements of a vector lattice. A vector lattice X is said to be order complete 
if any bounded from above set in X has a supremum. 

Let X and Y be ordered vector spaces. A linear operator U : X ^ Y is 
called positive, if x 0 implies Ux 0. An operator U is called regular, 
if it can be written as U = Ui — U 2 , where Ui and U 2 are positive operators. 

Denote the linear space of all regular operators X ^ y by iv""(X, y). A 
partial order can be introduced in iv""(X, y) in the following way: Ui > U 2 , 
if Ui — f /2 is a positive operator. If X and Y are vector lattices and Y is order 
complete, then iv""(X, y) is also an order complete vector lattice. 

A norm || • || defined in a vector lattice X is called monotone if |x| < \y\ 
implies ||x|| ^ ||^||. A vector lattice endowed with a monotone norm is 
called a Banach lattice if it is norm complete. If X and Y are Banach lattices, 
then all operators in iv""(X, y) are continuous. 

Let us list some examples of Banach lattices. The space of continuous 
functions where c is a Banach lattice under the natural point- 
wise ordering: / 9 if only if f{x) > g{x) for all x ^ 91. The spaces 

1 ^ p < 00 , are also Banach lattices under the following partial or¬ 
dering: f '^LP g ii and only if f{x) > g{x) almost everywhere in fi. With 
this partial order, Lp{91), 1 ^ p ^ 00 , are order complete Banach lattices. 
The Banach lattice of continuous functions C{91) is not order complete. 

Tensors in the Euclidean setting 

We call a fc-linear mapping A : x • • • x ^ M a A;-tensor, denoted 

A G This is a simplification from the full differential-geometric 

definition, sufficient for our finite-dimensional setting. We say that A is 


30 



symmetric, denoted A G Sym^(M’^), if it satisfies for any permutation tt of 
{1,..., /c} that 

• • • 5 ^7rfc) • • • 5 ‘ 

With ei,..., the standard basis of we define on (M^) the inner 
product 

{A^B) ,..., ep^)S(ep^,... , Cp^), 

and the Frobenius norm 


IIAIIf := ^(AA). 

The Frobenius norm is rotationally invariant in a sense crucial for DTI. We 
refer to [54] for a detailed discussion of this, as well of alternative rotation- 
ally invariant norms. 

Example .1 (Vectors). Vectors A G are of course symmetric 1-tensors, 
The inner product is the usual inner product in M^, and the Frobenius norm 
is the two-norm, \\A\\f = H^lb- 

Example .2 (Matrices). Matrices are 2-tensors: A{x^y) = {Ax,y), while 
symmetric matrices A = A^ are symmetric 2-tensors. The inner product 
is (A, B) = J2i j \\A\\f is the matrix Frobenius norm. 

We use the notation A > 0 for positive-semidefinite matrices A. One 
can verify that this relation indeed defines a partial order in the space of 
symmetric matrices: 

A> B iff A — S is positive semidefinite. (.4) 

With this partial order, the space of all symmetric matrices becomes an or¬ 
dered vector space, but not a vector lattice. However, it enjoys some prop¬ 
erties similar to those of vector lattices: for example, any directed upwards 
subset^ has a supremum [36, Ch.8]. 

Symmetric tensor fields of bounded deformation 

Let u : ft ^ Sym^(M’^) for a domain c We set 

\\u\\f,p ■■= {p € [l,oo)), 

and 

||w||f,oo := esssup3.gQ ||^/(a;)||i7', 

^Recall that an indexed subset {xr : r G {r}} of an ordered vector space X is called 
directed upwards if for any pair n, T2 G {r} there exists ts G {t} such that Xrg ^ Xn and 

Xq-^ ^ ^T2 • 
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The spaces Sym^(M’^)) are defined in the natural way using these 

norms, and clearly satisfy all the usual properties of spaces. 

In the particular case of matrices {k = 2), partial order can be introduced 
in the space LP(yt] Sym^(M’^)) in the following way: 

u^v mu{x) > v{x) almost everywhere in ft. (.5) 


Recall that the symbol > stands for the positive semidefinite order (.4) in 
the space of symmetric matrices. Since the positive semidefinite order is 
not a lattice, neither is the order (.5). 

If G Sym^(M’^)), A: > 1, we define by contraction the divergence 

div?i G Sym^“^(M’^) as 

m 

[div«(-)](ei2,...,eij := (e^i, V«(-)(e*i,..., e^J). (.6) 

il=l 


It is easily verified that div u{x) is indeed symmetric. Given a tensor field 
u G T^(]R’")) we then define the symmetrised distributional gradient 

Eu e by 

Eu{(p) := — [ (rt(a;),div(/j(a;)) dx, (y) G (7^(11; Sym^“'“^(]R"*))). 

Ja 


With these notions at hand, we now define the spaces of symmetric tensor 
fields of bounded deformation as (see also [54, 8]) 


BD(fl; Sym'=(M"*)) := |« G Sym'=(M"*)) 


sup^ 




Eu{y>) < oo|, 


where 

:= {p G C-(fl;Sym"(M-)) | ||p||^,oo < !}• 

For u G BD(i7; Sym^(]R’")), the symmetrised gradient Eu is a Radon mea¬ 
sure, Eu G Sym^''‘^(]R"*)). For the proof of this fact we refer to [20, 

§4.1.5], 

Example .3. The space BD(fl; Sym°(R"*)) agrees with the space BV(f2) of 
functions of bounded variation. The space BD(r2; Sym^(]R’")) = BD(i7) is 
the space of functions of bounded deformation of [48]. 
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